WOMBAT 2025 Tutorial

Visualising Uncertainty

Harriet Mason, Dianne Cook

Department of Econometrics and Business Statistics

Introduction to Spatial Visualisation

Why focus on spatial visualisations?

  • The spatial case is a good example to work through because the aesthetics available to express estimates are limited
  • Maps take up most of the usual aesthetics by being a representation of space
    • position, size, shape, etc. all have an implicit meaning in the mapping context
    • colour/fill is usually the only aesthetic we have left
    • can also get creative and do glyph maps (we will ignore this variation here)
  • Once we have filled in a map, colour/fill is often the only aesthetic that has

Citizen Scientist Data

  • There have been reports of a strange spatial pattern in the temperatures of Iowa
  • We get some citizen scientists to measure data at their home and report back
  • To maintain anonymity, we are only provided with the county of each scientist
scientistID  county_name       recorded_temp
#74991       Lyon County       21.1
#22780       Dubuque County    28.9
#55325       Crawford County   26.4
#46379       Allamakee County  27.1
#84259       Jones County      34.2

990 citizen scientists participated

We could just plot the data…

  • We often get spatial data in terms of longitude and latitude which we can plot directly
  • This approach is easy but lacks the contextual information that gives our plots meaning.

Spatial features objects

  • sf objects differ from a tibble because they carry additional metadata in the coordinate reference system (CRS). Specifically:
    • Assumptions about the shape of the planet (geodetic datum)
    • Distortions we will/won’t accept when drawing the map (map projection)
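
The CRS metadata can be inspected and changed directly. The following is a minimal sketch, assuming the sf package is installed; it uses the North Carolina shapefile that ships with sf as a stand-in, because the Iowa data is not included in these slides.

```r
library(sf)

# North Carolina counties shipped with sf (a stand-in, not the Iowa data)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# The CRS records the geodetic datum and, if there is one, the projection
st_crs(nc)

# TRUE when the coordinates are unprojected longitude/latitude
st_is_longlat(nc)

# Re-project to Web Mercator (EPSG:3857): same counties, different distortions
nc_merc <- st_transform(nc, 3857)
st_is_longlat(nc_merc)
```

The geometry column and its CRS travel with the data, so ordinary data-frame operations keep the spatial information attached.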

Can you see the spatial trend?

Estimate the county mean

  • Visualising an estimate, such as a mean, can make trends easier to see
    • We should use the sampling distribution of the estimate, but often we do not bother…
Code
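# Calculate the county mean and its standard error
library(dplyr)

toy_temp |>
  group_by(county_name) |>
  summarise(temp_mean = mean(recorded_temp),
            temp_se = sd(recorded_temp) / sqrt(n()),  # standard error of the mean
            n = n())                                  # number of scientists per county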
county_name       temp_mean  temp_se  n
Adair County      29.7       0.907    6
Adams County      29.6       1.003    9
Allamakee County  26.3       0.550    8
Appanoose County  22.8       0.831    14
Audubon County    27.6       0.893    11
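
To make "the sampling distribution" concrete, the sketch below turns three rows of the summary table into approximate 95% intervals. It assumes a Normal approximation to the sampling distribution of each county mean; with samples this small a t-based interval would be somewhat wider, so treat this as an illustration rather than the analysis used in the tutorial.

```r
# Rows copied from the summary table above
county_summary <- data.frame(
  county_name = c("Adair County", "Adams County", "Allamakee County"),
  temp_mean   = c(29.7, 29.6, 26.3),
  temp_se     = c(0.907, 1.003, 0.550)
)

# Normal approximation: mean +/- z * standard error
z <- qnorm(0.975)
county_summary$lower <- county_summary$temp_mean - z * county_summary$temp_se
county_summary$upper <- county_summary$temp_mean + z * county_summary$temp_se

county_summary
```

A choropleth of temp_mean shows only the centre of each interval and none of its width.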

Can you see the trend now?

Common Map Visualisations

  • Usually spatial data is shown using a choropleth map
    • Choropleth maps shade an area according to an average or total
  • We can also weight according to a different variable (such as sample size)
    • e.g. cartograms and bubble plots
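
A choropleth takes very little code in ggplot2. This is a minimal sketch, assuming sf and ggplot2 are installed; the North Carolina data shipped with sf and its BIR74 column (births in 1974) stand in for the Iowa temperatures, which are not included here.

```r
library(sf)
library(ggplot2)

# Stand-in data: North Carolina counties shipped with sf
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Choropleth: each county polygon is shaded by a single summary value
choropleth <- ggplot(nc) +
  geom_sf(aes(fill = BIR74), colour = "white", linewidth = 0.2) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +
  labs(fill = "Births (1974)") +
  theme_minimal()

choropleth
```

Fill is the only aesthetic carrying the estimate here, which is why there is no obvious place left to show its uncertainty.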

But what if the error is worse?

  • It turns out the citizen scientists are using some pretty old tools
  • The standard error could be up to three times what we would estimate with our usual assumptions.
  • We want to see both versions of the data so we can see the impact of this measurement error
county_name       temp_mean  low_temp_se  high_temp_se  n  county_geometry
Adair County      29.7       0.907        2.72          6  MULTIPOLYGON (((441130 -374...
Adams County      29.6       1.003        3.01          9  MULTIPOLYGON (((424556 -414...
Allamakee County  26.3       0.550        1.65          8  MULTIPOLYGON (((675217 -131...
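
The high_temp_se values in the table match low_temp_se multiplied by three. The sketch below assumes that is how the column was built, which the slides do not state explicitly; se_inflation is a name introduced here for illustration.

```r
# low_temp_se values copied from the table above
county_se <- data.frame(
  county_name = c("Adair County", "Adams County", "Allamakee County"),
  low_temp_se = c(0.907, 1.003, 0.550)
)

# "Up to three times" taken at its upper bound (an assumption)
se_inflation <- 3
county_se$high_temp_se <- round(se_inflation * county_se$low_temp_se, 2)

county_se
```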

Spot the difference

  • One of these plots was made with the high standard error data, and the other was made with the low standard error data. Can you tell which is which?

Exercise

Make the high and low variance choropleth maps yourself, and see why they come out looking identical

Approaches to Spatial Uncertainty

Solution 1: add an axis for uncertainty

  • Pro
    • Included uncertainty and increased transparency
  • Cons
    • Signals from high-uncertainty estimates are still very visible
    • 2D palette is harder to read
      • Colour is not a simple 3D space
      • Using saturation hurts accessibility

Solution 2: blend the colours together

  • Pros
    • Included uncertainty and increased transparency
    • Removed false signals
  • Cons
    • Still uses a 2D colour palette
    • The standard error at which the colours blend is arbitrary
      • Impossible to align with hypothesis testing

Solution 3: simulate a sample

  • Pros
    • Included uncertainty
    • High uncertainty interferes with reading of plot (?)
    • 1D colour palette
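
As we read it, the idea behind Solution 3 is to colour pieces of each area by draws from the estimate's sampling distribution instead of by the mean alone. The base R sketch below shows only the sampling step, under the assumption of a Normal sampling distribution; n_draws and simulate_draws() are hypothetical names, and the values are copied from the earlier table.

```r
set.seed(2025)  # arbitrary seed so the sketch is reproducible

county_summary <- data.frame(
  county_name  = c("Adair County", "Adams County", "Allamakee County"),
  temp_mean    = c(29.7, 29.6, 26.3),
  low_temp_se  = c(0.907, 1.003, 0.550),
  high_temp_se = c(2.72, 3.01, 1.65)
)

n_draws <- 7  # hypothetical number of draws (pieces) per county

# One row per county, one column per draw.
# rnorm() recycles means/ses, and matrix() fills column-wise,
# so row r always uses means[r] and ses[r].
simulate_draws <- function(means, ses, n) {
  matrix(rnorm(length(means) * n, mean = means, sd = ses),
         nrow = length(means))
}

low_draws  <- simulate_draws(county_summary$temp_mean, county_summary$low_temp_se, n_draws)
high_draws <- simulate_draws(county_summary$temp_mean, county_summary$high_temp_se, n_draws)

# Spread of the draws within each county under each scenario
data.frame(
  county_name = county_summary$county_name,
  low_spread  = apply(low_draws, 1, sd),
  high_spread = apply(high_draws, 1, sd)
)
```

When the draws within a county vary more than the means vary between counties, any apparent spatial trend is drowned out, which is the behaviour a sample map is meant to expose.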

Alternative software for incorporating uncertainty

  • Existing tidy data structures are not great for uncertain data
  • e.g. Vizumap
    • Makes bivariate maps and pixel (sample) maps
    • Package is designed specifically for uncertainty
  • Issues
    • ggplot2 flexibility is lost
      • e.g. you can only use one of three specific palettes
    • Very computationally expensive
      • A simple map can take over a minute to run
    • Need to make every component separately then combine

Making a Pixel Map with ggdibbler

ggplot2 uses the grammar of graphics

It is designed to take in data

Not theoretical distributions

This is what ggdibbler is for

Basic ggdibbler Example

Code
library(ggplot2)
library(ggdibbler)
toy_temp_dist |> 
  ggplot() + 
  geom_sf_sample(aes(geometry = county_geometry,
                     fill = temp_dist))
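
The temp_dist column holds distribution objects, not numbers. The slides do not show how toy_temp_dist was built, so the sketch below is one plausible construction, assuming a Normal sampling distribution and the dplyr and distributional packages; county_summary and county_dist are names introduced here, with values copied from the earlier summary table.

```r
library(dplyr)
library(distributional)

county_summary <- tibble(
  county_name = c("Adair County", "Adams County", "Allamakee County"),
  temp_mean   = c(29.7, 29.6, 26.3),
  temp_se     = c(0.907, 1.003, 0.550)
)

# dist_normal() takes the mean (mu) and the standard deviation (sigma)
county_dist <- county_summary |>
  mutate(temp_dist = dist_normal(mu = temp_mean, sigma = temp_se))

# Each element is a whole distribution
county_dist$temp_dist

# The usual summaries can be recovered from the distribution column
mean(county_dist$temp_dist)
sqrt(variance(county_dist$temp_dist))
```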

Can utilise ggplot2 flexibility

Code
ggplot(toy_temp_dist) +
  geom_sf_sample(aes(geometry = county_geometry, fill = temp_dist), linewidth = 0, n = 7) +
  geom_sf(aes(geometry = county_geometry), fill = NA, linewidth = 0.5, colour = "white") +
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction= 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  ggtitle("A super cool and customised plot")

Remember, the plot is random

Code
ggplot(toy_temp_dist) +
  geom_sf_sample(aes(geometry = county_geometry, fill = temp_dist), linewidth = 0, n = 7) +
  geom_sf(aes(geometry = county_geometry), fill = NA, linewidth = 0.5, colour = "white") +
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction= 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  ggtitle("A super cool and customised plot")
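
The same code gives a different picture each time because the fill values are random draws. The base R sketch below shows the general mechanism and how a seed makes draws repeatable. Whether setting a seed before plotting also fixes the output of geom_sf_sample() depends on how ggdibbler draws its samples, which these slides do not confirm, so treat that part as an assumption to check.

```r
# Two unseeded draws from the same distribution will generally differ
draw_a <- rnorm(5, mean = 29.7, sd = 0.907)
draw_b <- rnorm(5, mean = 29.7, sd = 0.907)
identical(draw_a, draw_b)

# Resetting the seed before each draw reproduces it exactly
set.seed(1)
seeded_a <- rnorm(5, mean = 29.7, sd = 0.907)
set.seed(1)
seeded_b <- rnorm(5, mean = 29.7, sd = 0.907)
identical(seeded_a, seeded_b)
```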

Exercise

Here is the code that was used to make the cartogram from earlier in the session. Can you make a ggdibbler version of this plot?

Code
library(sf)
library(cartogram)
library(ggplot2)

# Transform to the CRS needed for the cartogram transformation (EPSG:3857)
toy_merc <- st_transform(toy_temp_mean, 3857)
# Cartogram transformation, weighting each county by its sample size
toy_cartogram <- cartogram_cont(toy_merc, weight = "n", itermax = 5)
# Transform back to the original CRS
toy_cartogram <- st_transform(toy_cartogram, st_crs(toy_temp_mean))

# Plot cartogram using ggplot2
ggplot(toy_cartogram) +
  geom_sf(aes(fill = temp_mean), linewidth = 0, alpha = 0.9) +
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction= 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  theme(aspect.ratio=0.7)

Solution

Code
# The only change to the data is adding a distribution column
toy_cartogram |>
  # dist_normal() is parameterised by the standard deviation, so temp_se is passed directly
  mutate(temp_dist = distributional::dist_normal(temp_mean, temp_se)) |>
  ggplot() +
  geom_sf_sample(aes(geometry = county_geometry,
                     fill = temp_dist), linewidth = 0) +
  geom_sf(aes(geometry = county_geometry), fill = NA, colour = "white") +
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction= 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  theme(aspect.ratio=0.7)